124 research outputs found

    A Greedy Algorithm for Unimodal Kernel Density Estimation by Data Sharpening

    We consider the problem of nonparametric density estimation where estimates are constrained to be unimodal. Though several methods have been proposed to achieve this end, each has its own drawbacks and none has readily available computer code. The approach of Braun and Hall (2001), where a kernel density estimator is modified by data sharpening, is one of the most promising options, but optimization difficulties make it hard to use in practice. This paper presents a new algorithm and MATLAB code for finding good unimodal density estimates under the Braun and Hall scheme. The algorithm uses a greedy, feasibility-preserving strategy to ensure that it always returns a unimodal solution. Compared to the incumbent method of optimization, the greedy method is easier to use, runs faster, and produces solutions of comparable quality. It can also be extended to the bivariate case.

    A Greedy Algorithm for Unimodal Kernel Density Estimation by Data Sharpening

    Nonparametric methods for smoothing, regression, and density estimation produce estimators with great shape flexibility. Although this flexibility is an advantage, the practical value of nonparametric methods would be increased if qualitative constraints (natural-language shape restrictions) could also be imposed on the estimator. In density estimation, the most common such constraints are monotonicity (the density must be nondecreasing or nonincreasing) and unimodality (the density must have only one peak). The work presented here takes unimodal kernel density estimation as a representative problem in constrained nonparametric estimation. The method proposed for handling the constraint is data sharpening. A greedy algorithm is described for achieving the unimodality constraint. The algorithm is deterministic and runs quickly. It can find solutions that are competitive with the incumbent method, sequential quadratic programming.
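
    The unimodality constraint described above (only one peak) can be checked numerically on a grid. The following is a minimal sketch, not the greedy data-sharpening algorithm itself: it assumes a Gaussian kernel and a fixed bandwidth, and the function names (gaussian_kde_on_grid, count_modes, is_unimodal) are hypothetical names chosen for illustration.

```python
import numpy as np


def gaussian_kde_on_grid(data, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance between every grid point and every data point.
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def count_modes(density, tol=1e-12):
    """Count local maxima of a density sampled on an increasing grid."""
    diffs = np.diff(density)
    # Ignore numerically flat steps (for example, far out in the tails).
    signs = np.sign(diffs[np.abs(diffs) > tol])
    # A mode is a rise immediately followed by a fall.
    return int(np.sum((signs[:-1] > 0) & (signs[1:] < 0)))


def is_unimodal(data, bandwidth, grid):
    """Return True if the grid-evaluated estimate has exactly one peak."""
    return count_modes(gaussian_kde_on_grid(data, bandwidth, grid)) == 1
```

    A check of this kind only detects violations at the grid resolution. A sharpening method would then move data points until the check passes, which this sketch does not attempt.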

    Methods for Shape-Constrained Kernel Density Estimation

    Nonparametric density estimators are used to estimate an unknown probability density while making minimal assumptions about its functional form. Although the low reliance of nonparametric estimators on modelling assumptions is a benefit, their performance will be improved if auxiliary information about the density's shape is incorporated into the estimate. Auxiliary information can take the form of shape constraints, such as unimodality or symmetry, that the estimate must satisfy. Finding the constrained estimate is usually a difficult optimization problem, however, and a consistent framework for finding estimates across a variety of problems is lacking. It is proposed to find shape-constrained density estimates by starting with a pilot estimate obtained by standard methods, and subsequently adjusting its shape until the constraints are satisfied. This strategy is part of a general approach, in which a constrained estimation problem is defined by an estimator, a method of shape adjustment, a constraint, and an objective function. Optimization methods are developed to suit this approach, with a focus on kernel density estimation under a variety of constraints. Two methods of shape adjustment are examined in detail. The first is data sharpening, for which two optimization algorithms are proposed: a greedy algorithm that runs quickly but can handle a limited set of constraints, and a particle swarm algorithm that is suitable for a wider range of problems. The second is the method of adjustment curves, for which it is often possible to use quadratic programming to find optimal estimates. The methods presented here can be used for univariate or higher-dimensional kernel density estimation with shape constraints. They can also be extended to other estimators, in both the density estimation and regression settings. As such they constitute a step toward a truly general optimizer that can be used on arbitrary combinations of estimator and constraint.

    A Genetic Algorithm for Selection of Fixed-Size Subsets with Application to Design Problems

    The R function kofnGA conducts a genetic algorithm search for the best subset of k items from a set of n alternatives, given an objective function that measures the quality of a subset. The function fills a gap in the presently available subset selection software, which typically searches over a range of subset sizes, restricts the types of objective functions considered, or does not include freely available code. The new function is demonstrated on two types of problem where a fixed-size subset search is desirable: design of environmental monitoring networks, and D-optimal design of experiments. Additionally, the performance is evaluated on a class of constructed test problems with a novel design that is interesting in its own right.
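
    The idea of a genetic search over fixed-size subsets can be illustrated with a short sketch. This is not the kofnGA implementation and does not reproduce its interface. The function subset_ga, its parameters, and the choices of tournament selection, union-based crossover, swap mutation, and minimisation are all assumptions made for illustration.

```python
import random


def _tournament(rng, population, objective):
    """Pick the better of two randomly chosen subsets (lower is better)."""
    a, b = rng.sample(population, 2)
    return a if objective(a) <= objective(b) else b


def subset_ga(n, k, objective, pop_size=40, generations=100,
              mutation_prob=0.2, seed=0):
    """Search for a size-k subset of range(n) that minimises objective."""
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(range(n), k)))
                  for _ in range(pop_size)]
    best = min(population, key=objective)
    for _ in range(generations):
        children = [best]  # elitism: the best subset found so far survives
        while len(children) < pop_size:
            p1 = _tournament(rng, population, objective)
            p2 = _tournament(rng, population, objective)
            # Crossover: draw k items from the parents' union, so every
            # child is again a subset of exactly k distinct items.
            pool = sorted(set(p1) | set(p2))
            child = set(rng.sample(pool, k))
            # Mutation: swap one chosen item for one currently left out.
            if rng.random() < mutation_prob:
                outside = [i for i in range(n) if i not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = children
        best = min(population, key=objective)
    return best, objective(best)
```

    Because crossover and mutation both preserve the subset size, no repair step or penalty term is needed to keep the search within size-k subsets.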

    Ligand Binding and Crystal Structures of the Substrate-Binding Domain of the ABC Transporter OpuA

    Background: The ABC transporter OpuA from Lactococcus lactis transports glycine betaine upon activation by threshold values of ionic strength. In this study, the ligand binding characteristics of purified OpuA in a detergent-solubilized state and of its substrate-binding domain produced as soluble protein (OpuAC) were characterized. Principal Findings: The binding of glycine betaine to purified OpuA and OpuAC (KD = 4–6 µM) did not show any salt dependence or cooperative effects, in contrast to the transport activity. OpuAC is highly specific for glycine betaine and the related proline betaine. Other compatible solutes like proline and carnitine bound with affinities that were 3 to 4 orders of magnitude lower. The low affinity substrates were not noticeably transported by membrane-reconstituted OpuA. OpuAC was crystallized in an open (1.9 Å) and closed-liganded (2.3 Å) conformation. The binding pocket is formed by three tryptophans (Trp-prism) coordinating the quaternary ammonium group of glycine betaine in the closed-liganded structure. Even though the binding site of OpuAC is identical to that of its B. subtilis homolog, the affinity for glycine betaine is 4-fold higher. Conclusions: Ionic strength did not affect substrate binding to OpuA, indicating that regulation of transport is not at the level of substrate binding, but rather at the level of translocation. The overlap between the crystal structures of OpuAC from L. lactis and B. subtilis, comprising the classical Trp-prism, shows that the differences observed in the binding affinities originate from outside of the ligand binding site.

    Large-scale electron microscopy database for human type 1 diabetes

    Autoimmune β-cell destruction leads to type 1 diabetes, but the pathophysiological mechanisms remain unclear. To help address this void, we created an open-access online repository, unprecedented in its size, composed of large-scale electron microscopy images ('nanotomy') of human pancreas tissue obtained from the Network for Pancreatic Organ donors with Diabetes (nPOD; www.nanotomy.org). Nanotomy allows analyses of complete donor islets with up to macromolecular resolution. Anomalies we found in type 1 diabetes included (i) an increase of 'intermediate cells' containing granules resembling those of exocrine zymogen and endocrine hormone secreting cells; and (ii) elevated presence of innate immune cells. These are our first results of mining the database and support recent findings that suggest that type 1 diabetes includes abnormalities in the exocrine pancreas that may induce endocrine cellular stress as a trigger for autoimmunity.

    Exome sequencing in patient-parent trios suggests new candidate genes for early-onset primary sclerosing cholangitis

    Background & Aims: Primary sclerosing cholangitis (PSC) is a rare bile duct disease strongly associated with inflammatory bowel disease (IBD). Whole-exome sequencing (WES) has contributed to understanding the molecular basis of very early-onset IBD, but rare protein-altering genetic variants have not been identified for early-onset PSC. We performed WES in patients diagnosed with PSC. Methods: In this multicentre study, WES was performed on 87 DNA samples from 29 patient-parent trios with early-onset PSC. We selected rare (minor allele frequency <2%) coding and splice-site variants that matched recessive (homozygous and compound heterozygous variants) and dominant (de novo) inheritance in the index patients. Variant pathogenicity was predicted by an in-house developed algorithm (GAVIN), and PSC-relevant variants were selected using gene expression data and gene function. Results: In 22 of 29 trios we identified at least 1 possibly pathogenic variant. We prioritized 36 genes, harbouring a total of 54 variants with predicted pathogenic effects. In 18 genes, we identified 36 compound heterozygous variants, whereas in the other 18 genes we identified 18 de novo variants. Twelve of 36 candidate risk genes are known to play a role in transmembrane transport, adaptive and innate immunity, and epithelial barrier function. Conclusions: The 36 candidate genes for early-onset PSC need further verification in other patient cohorts and evaluation of gene function before a causal role can be attributed to their variants.

    Multilevel latent class casemix modelling: a novel approach to accommodate patient casemix

    Background: Using routinely collected patient data, we explore the utility of multilevel latent class (MLLC) models to adjust for patient casemix and rank Trust performance. We contrast this with ranks derived from Trust standardised mortality ratios (SMRs). Methods: Patients with colorectal cancer diagnosed between 1998 and 2004 and resident in Northern and Yorkshire regions were identified from the cancer registry database (n = 24,640). Patient age, sex, stage-at-diagnosis (Dukes), and Trust of diagnosis/treatment were extracted. Socioeconomic background was derived using the Townsend Index. Outcome was survival at 3 years after diagnosis. MLLC-modelled and SMR-generated Trust ranks were compared. Results: Patients were assigned to two classes of similar size: one with reasonable prognosis (63.0% died within 3 years), and one with better prognosis (39.3% died within 3 years). In patient class one, all patients diagnosed at stage B or C died within 3 years; in patient class two, all patients diagnosed at stage A, B or C survived. Trusts were assigned two classes with 51.3% and 53.2% of patients respectively dying within 3 years. Differences in the ranked Trust performance between the MLLC model and SMRs were all within estimated 95% CIs. Conclusions: A novel approach to casemix adjustment is illustrated, ranking Trust performance whilst facilitating the evaluation of factors associated with the patient journey (e.g. treatments) and factors associated with the processes of healthcare delivery (e.g. delays). Further research can demonstrate the value of modelling patient pathways and evaluating healthcare processes across provider institutions.